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I. Introduction 

With the advent of high-throughput data collection techniques, low-dimensional matrix factorizations 
have become an essential tool for pre-processing, interpreting or compressing high-dimensional data. They 
are widely used in a variety of signal processing domains including electrocardiogram [1 ], image 0, or 
sound [3] processing. These methods can take advantage of a large range of a priori knowledge on the 
form of the factors, enforcing it through constraints on sparsity or patterns in the factors. However, these 
methods do not work well when there are unknown misalignments between subjects in the population, 
e.g., unknown subject-specific time shifts. In such cases, one cannot apply standard patterning constraints 
without first aligning the data; a difficult task. An alternative approach, explored in this paper, is to impose 
a factorization constraint that is invariant to factor misalignments but preserves the relative ordering of 
the factors over the population. This order-preserving factor analysis is accomplished using a penalized 
least squares formulation using shift-invariant yet order-preserving model selection (group lasso) penalties 
on the factorization. As a byproduct the factorization produces estimates of the factor ordering and the 
order-preserving time shifts. 

In traditional matrix factorization, the data is modeled as a linear combination of a number of factors. 
Thus, given annxp data matrix X, the Linear Factor model is defined as: 

X = MA + e, (1) 

where M is a n x / matrix of factor loadings or dictionary elements, A is a / x p matrix of scores (also 
called coordinates) and e is a small residual. For example, in a gene expression time course analysis, n is 
the number of time points and p is the number of genes in the study, the columns of M contain the features 
summarizing the genes' temporal trajectories and the columns of A represent the coordinates of each gene 
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on the space spanned by M. Given this model, the problem is to find a parsimonious factorization that 
fits the data well according to selected criteria, e.g. minimizing the reconstruction error or maximizing 
the explained variance. There are two main approaches to such a parsimonious factorization. One, called 
Factor Analysis, assumes that the number of factors is small and yields a low-rank matrix factorization 
fl4ll . 151 . The other, called Dictionary Learning ||7] or Sparse Coding JSJ, assumes that the loading 
matrix M comes from an overcomplete dictionary of functions and results in a sparse score matrix A. 
There are also hybrid approaches such as Sparse Factor Analysis CQ, QD, OH that try to enforce low rank 
and sparsity simultaneously. 

In many situations, we observe not one but several matrices X s , s = 1, • • • ,S and there are physical 
grounds for believing that the X s 's share an underlying model. This happens, for instance, when the 
observations consist of different time-blocks of sound from the same music piece (3j , ifTOl , when they 
consist of time samples of gene expression microarray data from different individuals inoculated with 
the same virus iTTTI . or when they arise from the reception of digital data with code, spatial and temporal 
diversity |[T2l . In these situations, the fixed factor model £[)) is overly simplistic. 

An example, which is the main motivation for this work is shown in Figure Q] which shows the effect 
of temporal misalignment across subjects in a viral challenge study reported in JTTJ. Figure Q] shows the 
expression trajectory for a particular gene that undergoes an increase (up-regulation) after viral inoculation 
at time 0, where the moment when up-regulation occurs differs over the population. Training the model 
(OQ) on this data will produce poor fit due to misalignment of gene expression onset times. 
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Fig. 1. Example of temporal misalignment across subjects of upregulated gene CCRL2. Subject 6 and subject 10 show the 
earliest and the latest up-regulation responses, respectively. 

A more sensible approach for the data in Figure [T] would be to separately fit each subject with a 
translated version of a common up-regulation factor. This motivates the following extension of model 
fl}, where the factor matrices M s , A s are allowed to vary across observations. Given a number S of 
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n x p data matrices X s , we let: 

X s = M s A s + e s s = I,-- - ,S. (2) 

Following the gene expression example, here n is the number of time points, p is the number of genes 
in the study, and S is the number of subjects participating in the study. Hence, the n x / matrices M s 
contain the translated temporal features corresponding to the s-th subject and the / x p matrices A s 
accommodate the possibility of subjects having different mixing weights. For different constraints on 
M s , A s , this model specializes to several well-known paradigms such as Principal Components Analysis 
(PCA) HI, sparse PCA Q, k-SVD [6], structured PCA (2, Non-Negative Matrix Factorization (NNMF) 
lfl3l . Maximum-Margin Matrix Factorization (MMMF) [14], Sparse Shift-invariant models [3|, Parallel 
Factor Analysis (PARAFAC) @, US or Higher-Order SVD (HOSVD) [16]. Table HI summarizes the 
characteristics of these decomposition models when seen as different instances of the general model d2j. 

TABLE I 

Special cases of the general model I©. 



Decomposition 


Structure of M s 


Structure of A s 


Uniqueness 


Reference 


PCA 


Orthogonal M s = F 


Orthogonal A s 


Yes 


SVD 


Sparse-PCA 


Sparse M s = F 


Sparse A s 


No 


Sparse PCA ifTl, fTTl, 
k-SVD |6|, PMD (H 


Structured-PCA 


M a — F 


Structured Sparse A s 


No 


□ 


NNMF 


Non-negative M s = F 


Non-negative A s 


No 


CD 


Sparse 
Shift-invariant 
models 


M 3 = [M(F,di)---M(F,d D )] 
where {dj}®_ 1 are all possible 
translations of the n-dimensional vectors in F. 


Sparse A s 


No 


ED, 
CD, 01 


PARAFAC/CP 


Ms — F 


As = diag (C., s ) B' 


Yes 


ED 


HOSVD 


Orthogonal M s = F 


As = (G x 3 C.,,)B' 
where slices of Q 
are orthogonal 


Yes 


ED 


OPFA 


Ms =M{F,d a ), d s €K 
where F is smooth 
and non-negative and 
K, enforces consistent 
precedence order 


Non-negative, 
sparse A s 


No 


This work. 



In this paper, we will restrict the columns of M s to be translated versions of a common set of factors, 
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where these factors have onsets that occur in some relative order that is consistent across all subjects. 
Our model differs from previous shift-invariant models considered in [18], Q, ifTOI in that it restricts the 
possible shifts to those which preserve the relative order of the factors among different subjects. We call 
the problem of finding a decomposition d2) under this assumption the Order Preserving Factor Analysis 
(OPFA) problem. 

The contributions of this paper are the following. First, we propose a non-negatively constrained 
linear model that accounts for temporally misaligned factors and order restrictions. Second, we give a 
computational algorithm that allows us to fit this model in reasonable time. Finally, we demonstrate that 
our methodology is able to succesfully extract the principal features in a simulated dataset and in a real 
gene expression dataset. In addition, we show that the application of OPFA produces factors that can be 
used to significantly reduce the variability in clustering of gene expression responses. 

This paper is organized as follows. In Section|II]we present the biological problem that motivates OPFA 
and introduce our mathematical model. In Section JIIJ we formulate the non-convex optimization problem 
associated with the fitting of our model and give a simple local optimization algorithm. In Section JV] 
we apply our methodology to both synthetic data and real gene expression data. Finally we conclude in 
Section [V] For lack of space many technical details are left out of our presentation but are available in 
the accompanying technical report [19]. 

II. Motivation: gene expression time-course data 

In this section we motivate the OPFA mathematical model in the context of gene expression time- 
course analysis. Temporal profiles of gene expression often exhibit motifs that correspond to cascades 
of up-regulation/down-regulation patterns. For example, in a study of a person's host immune response 
after inoculation with a certain pathogen, one would expect genes related to immune response to exhibit 
consistent patterns of activation across pathogens, persons, and environmental conditions. 

A simple approach to characterize the response patterns is to encode them as sequences of a few basic 
motifs such as (see, for instance, [20]): 

• Up-regulation: Gene expression changes from low to high. 

• Down-regulation: Gene expression changes from a high to a low level. 

• Steady: Gene expression does not vary. 

If gene expression is coherent over the population of several individuals, e.g., in response to a common 
viral insult, the response patterns can be expected to show some degree of consistency across subjects. 
Human immune system response is a highly evolved system in which several biological pathways are 



6 




Fig. 2. Example of gene patterns with a consistent precedence-order across 3 subjects. The down-regulation motif of gene 
CD1C precedes the peak motif of gene ORM1 across these three subjects. 



recruited and organized over time. Some of these pathways will be composed of genes whose expressions 
obey a precedence-ordering, e.g., virally induced ribosomal protein production may precede toll-like 
receptor activation and antigen presentation ll2T1 . This consistency exists despite temporal misalignment: 
even though the order is preserved, the specific timing of these events can vary across the individuals. 
For instance, two different persons can have different inflammatory response times, perhaps due to a 
slower immune system in one of the subjects. This precedence-ordering of motifs in the sequence of 
immune system response events is invariant to time shifts that preserve the ordering. Thus if a motif in 
one gene precedes another motif in another gene for a few subjects, we might expect the same precedence 
relationship to hold for all other subjects. Figure |2]shows two genes from [11] whose motif precedence- 
order is conserved across 3 different subjects. This conservation of order allows one to impose ordering 
constraints on without actually knowing the particular order or the particular factors that obey the 
order-preserving property. 

Often genes are co-regulated or co-expressed and have highly correlated expression profiles. This can 
happen, for example, when the genes belong to the same signaling pathway. Figure [3] shows a set of 
different genes that exhibit a similar expression pattern (up-regulation motif). The existence of high 
correlation between large groups of genes allows one to impose a low rank property on the factorization 
in ©. 

In summary, our OPFA model is based on the following assumptions: 

• Al: Motif consistency across subjects: Gene expression patterns have consistent (though not-necessarily 
time aligned) motifs across subjects undergoing a similar treatment. 
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Fig. 3. Example of gene patterns exhibiting co-expression for a particular subject in the viral challenge study in [11], 

• A2: Motif sequence consistency across subjects: If motif X precedes motif Y for subject s, the same 
precedence must hold for subject t ^ s. 

• A3: Motif consistency across groups of genes: There are (not necessarily known) groups of genes 
that exhibit the same temporal expression patterns for a given subject. 

• A4: Gene Expression data is non-negative: Gene expression on a microarray is measured as an 
abundance and standard normalization procedures, such as RMA ll22l . preserve the non-negativity 
of this measurement. 

A few microarray normalization software packages produce gene expression scores that do not satisfy the 
non-negativity assumption A4. In such cases, the non-negativity constraint in the algorithm implementing 
© can be disabled. Note that in general, only a subset of genes may satisfy assumptions A1-A3. 

III. OPFA MATHEMATICAL MODEL 

In the OPFA model, each of the S observations is represented by a linear combination of temporally 
aligned factors. Each observation is of dimension n x p, where n is the number of time points and p is 
the number of genes under consideration. Let F be an n x / matrix whose columns are the / common 
alignable factors, and let M(F,d) be a matrix valued function that applies a circular shift to each 
column of F according to the vector of shift parameters d, as depicted in Figure [12] Then, we can refine 
model © by restricting M s to have the form: 

M s = M(F,d s ). (3) 



s 



r 



M(F,d) 



Hi 



I 



Fig. 4. Each subject's factor matrix M s is obtained by applying a circular shift to a common set of factors F parameterized 
by a vector d. 



where d s € {0, • • • , d max } and d max < n is the maximum shift allowed in our model. This model is a 
generalization of a simpler one that restricts all factors to be aligned but with a common delay: 

M s = U S F, (4) 

where U s is a circular shift operator. Specifically, the fundamental characteristic of our model (|3) is that 
each column can have a different delay, whereas dU) is a restriction of (0) with df = d| for all s and all 

i, 3- 

The circular shift is not restrictive. By embedding the observation into a larger time window it can 
accommodate transient gene expression profiles in addition to periodic ones, e.g., circadian rhythms [19 1. 
There are several ways to do this embedding. One way is to simply extrapolate the windowed, transient 
data to a larger number of time points np = n + d mSbX . This is the strategy we follow in the numerical 
experiments of Section IV-B. 

This alignable factor model parameterizes each observation's intrinsic temporal dynamics through the 
/-dimensional vector d s . The precedence-ordering constraint A2 is enforced by imposing the condition 

4 1 < d£ & <f < df 2 Vs 2 + 8 U (5) 

that is, if factor j\ precedes factor j2 in subject s\, then the same ordering will hold in all other subjects. 
Since the indexing of the factors is arbitrary, we can assume without loss of generality that df < df +1 
for all i and all s. This characterization constrains each observation's delays d s independently, allowing 
for a computationally efficient algorithm for fitting model ©. 
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A. Relationship to 3-way factor models. 

Our proposed OPFA framework is significantly different from other factor analysis methods and these 
differences are illustrated in the simulated performance comparisons below. However, there are some 
similarities, especially to 3-way factor models |Q3], ||23l that are worth pointing out. 

An n-th order tensor or n-way array is a data structure whose elements are indexed by an n-tuple 
of indices ll23l . n-way arrays can be seen as multidimensional generalizations of vectors and matrices: 
an 1-way array is a vector and a 2-way array is a matrix. Thus, we can view our observations X s as 
the slices of a third order tensor X of dimension p x n x S: X s = Af V) j. Tensor decompositions aim at 
extending the ideas of matrix (second order arrays) factorizations to higher order arrays [1151 . ll23l and 
have found many applications in signal processing and elsewhere ||23l , |[T5l , |[T6l , |[T2l . Il24l . Since our 
data tensor is of order 3, we will only consider here 3-way decompositions, which typically take the 
following general form: 

P Q R 

^-i,j,k = ^ ^ ^ GpqrFipBj q Ckr (6) 
p=l q=l r=l 

where P, Q, R are the number of columns in each of the factor matrices F, B, C and Qis&PxQ~x.R 
tensor. This class of decompositions is known as the Tucker model. When orthogonality is enforced 
among F, B, C and different matrix slices of Q, one obtains the Higher Order SVD [16|. When Q is a 
superdiagonal tensoiQ and P = Q = R, this model amounts to the PARAFAC/Canonical Decomposition 
(CP) model O, H3, (H. The PARAFAC model is the closest to OPFA. Under this model, the slices 
of Xi j k can be written as: 

X s CP = Fdiag(C, s )B'. (7) 

This expression is to be compared with our OPFA model, which we state again here for convenience: 

x OPFA = M d s^ As _ (8) 

Essentially, d7) shows that the PARAFAC decomposition is a special case of the OPFA model ([8]) 
where the factors are fixed (M s = F) and the scores only vary in magnitude across observations 
(A s = diag (C. )S ) B'). This structure enhances uniqueness (under some conditions concerning the linear 
independence of the vectors in F, B, C, see lfT31 ) but lacks the additional flexibility necessary to model 
possible translations in the columns of the factor matrix F. If d s = for all s, then the OPFA ([8]) and 

[ Q is superdiagonal tensor when C/yj. = except for i — j — k. 
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the Linear Factor model ([T) also coincide. The OPFA model can be therefore seen as an extension of 
the Linear Factor and PARAFAC models where the factors are allowed to experiment order-preserving 
circular translations across different individuals. 

B. OPFA as an optimization problem 

OPFA tries to fit the model ©-(T5]) to the data {-Xgj-fLj. For this purpose, we define the following 
penalized and constrained least squares problem: 

min £f =1 ||X s -M(F,d s )A s |||, + APi(Ai,--- ,A s )+ftP 2 (F) (9) 
s.t. {d s } s eK,Fe/,A s eA 

where \\-\\ F is the Frobenius norm, A and ft are regularization parameters, and the set K, constrains the 
delays d s to be order-preserving: 

K = {d e {0, ■ ■ ■ ,d max } f : di+i > di,Vi} . (10) 

where (i max < n. The other soft and hard constraints are briefly described as follows. 

For the gene expression application we wish to extract factors F that are smooth over time and non- 
negative. Smoothness will be captured by the constraint that P%(F) is small where Pi{F) is the squared 
total variation operator 

/ 

P 2 (F) = J2\\WFJ\1 (11) 

i=i 

where W is an appropriate weighting matrix and F j denotes the i-th column of matrix F. From A4, the 
data is non-negative and hence non-negativity is enforced on F and the loadings A s to avoid masking of 
positive and negative valued factors whose overall contribution sums to zero. To avoid numerical instability 
associated with the scale invariance MA = ^MaA for any a > 0, we constrain the Frobenius norm 
of F. This leads to the following constraint sets: 

T= [F g R" x/ : ||F|| F < ^} (12) 
A s = R f + Xp ,s = l,--- ,S 

The parameter 5 above will be fixed to a positive value as its puipose is purely computational and 
has little practical impact. Since the factors F are common to all subjects, assumption A3 requires that 
the number of columns of F (and therefore, its rank) is small compared to the number of genes p. In 
order to enforce Al we consider two different models. In the first model, which we shall name OPFA, 
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we constrain the columns of A s to be sparse and the sparsity pattern to be consistent across different 
subjects. Notice that Al does not imply that the mixing weights A s are the same for all subjects as this 
would not accommodate magnitude variability across subjects. We also consider a more restrictive model 
where we constrain A\ = • • • = A$ = A with sparse A and we call this model OPFA-C, the C standing 
for the additional constraint that the subjects share the same sequence A of mixing weights. The OPFA-C 
model has a smaller number of parameters than OPFA, possibly at the expense of introducing bias with 
respect to the unconstrained model. A similar constraint has been succesfully adopted in |[25l in a factor 
model for multi-view learning. 

Similarly to the approach taken in [26 1 in the context of simultaneous sparse coding, the common 
sparsity pattern for OPFA is enforced by constraining P\(A\,--- , A$) to be small, where P\ is a 
mixed-norm group-Lasso type penalty function [27 1. For each of the p x / score variables, we create a 
group containing its S different values across subjects: 

v f 

p 1 (a 1 ,...,a s ) = J2T,\\ [ A 4* • • • [M-,* ll* ( 13 ) 

i=i j=i 

Table [TT] summarizes the constraints of each of the models considered in this paper. 

Following common practice in factor analysis, the non-convex problem (0 is addressed using Block 
Coordinate Descent, displayed in the figure labeled Algorithm [T] which iteratively minimizes © with 
respect to the shift parameters {<i s }f =1 , the scores {A s }f =1 and the factors F while keeping the other 
variables fixed. This algorithm is guaranteed to monotonically decrease the objective function at each 
iteration. Since both the Frobenius norm and Pi (•), P% (■) are non-negative functions, this ensures that 
the algorithm converges to a (possibly local) minima or a saddle point of I©. 

The subroutines EstimateFactors and EstimateScores solve the following penalized regression problems: 

mm Es=i\\Xs-M(F,d s )A s \\ 2 F + PZ{=i\\WF.£ 2 ( 14 ) 

F 

' \\F\\%<& 
s.t. I F itj >0 i = l,---,n, 
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Algorithm 1: BCD algorithm for finding a local minima of d9j. 
OPFAObjective \ F, {A s }f =1 , {d s }f =1 ) denotes the objective function in © 

Input: Initial estimate of F and {A s } s=1 , e, A, (3. 

Output: F, {A s } s s=1 , {d s } s s=1 

c° = oo 

c 1 <- OPFAObjective (f, {A s } s s=1 , {d s }f =1 ) 
t = 1 

while c* 1 — c* > e do 

Wf=i <- EstimateDelays (f, {A s }f =1 ) 
{A s }f =1 <- EstimateScores (f, {d s }^ =1 ^j 
F EstimateFactors ({A s }f =1 , {d s }f =1 ) 
c* <- OPFAObjective (V, {A s }f =1 , K}f =1 ) 



and 

r m ^ Ef=i II** - M (F, d°) A S \\ 2 F + A £? =1 Ej=i II [^^ • • • [A 8 ] jti h (15) 

[A a ] jti > i = I,-- - ,n, 

i = i,---,/, (16) 

s = 1, • • • ,5 

Notice that in OPFA-C, we also incorporate the constraint A\ = • • • = Ag in the optimization problem 
above. The first is a convex quadratic problem with a quadratic and a linear constraint over a domain of 
dimension fn. In the applications considered here, both n and / are small and hence this problem can be 
solved using any standard convex optimization solver. EstimateScores is trickier because it involves a non- 
differentiable convex penalty and the dimension of its domain is equal td^ Sfp, where p can be very large. 
In our implementation, we use an efficient first-order method [28 ] designed for convex problems involving 
a quadratic term, a non-smooth penalty and a separable constraint set. These procedures are described in 
more detail in Appendix O and therefore we focus on the EstimateDelays subroutine. EstimateDelays is 

2 This refers to the OPFA model. In the OPFA-C model, the additional constraint Ai — ■ ■ ■ — As = A reduces the dimension 
to fp. 




13 



a discrete optimization that is solved using a branch-and-bound (BB) approach ||29l . In this approach a 
binary tree is created by recursively dividing the feasible set into subsets ("branch"). On each of the nodes 
of the tree lower and upper bounds ("bound") are computed. When a candidate subset is found whose 
upper bound is less than the smallest lower bound of previously considered subsets these latter subsets 
can be eliminated ("prune") as candidate minimizers. Whenever a leaf (singleton subset) is obtained, the 
objective is evaluated at the corresponding point. If its value exceeds the current optimal value, the leaf 
is rejected as a candidate minimizer, otherwise the optimal value is updated and the leaf included in the 
list of candidate minimizers. Details on the application of BB to OPFA are given below. 
The subroutine EstimateDelays solves S uncoupled problems of the form: 

mm\\X s -M(F,d)A s \\ 2 F , (17) 

daK, 

where the set K is defined in (fTOb . The "branch" part of the optimization is accomplished by recursive 
splitting of the set K to form a binary tree. The recursion is initialized by setting S Q = {0, • • • , d mSiX } * , 
I = {d £ K n S a } . The splitting of the set Z into two subsets is done as follows 

11 = {de!CnS :d Wl < 7l } (18) 

1 2 = {d £ ICnS : d Ul > 71}, 

and we update S\ = {d £ S Q : d Wl < 71}, S2 = {d £ S Q : d Wl > 71}. Here 71 is an integer < 71 < 
dmax, and oj\ £ {1, • ■ • , /}. X\ contains the elements d G K whose ^i-th component is strictly larger 
than 71 and Z2 contains the elements whose 07 -fh component is smaller than 71. The same kind of 
splitting procedure is then subsequently applied to X\, Z2 and its resulting subsets. After k— 1 successive 
applications of this decomposition there will be 2 k ~ l subsets and the A;-th split will be : 

it-- {de/cnM (19) 
X m := {de/cns m } 

where 

S t = {d £ : d Uk < 7 fc } 

S t +i = {d £ S^ : d Wk > 7 fc } . 

and 7Tfc £ {l, • • • , 2 fc_1 } denotes the parent set of the two new sets t and t + 1, i.e. pa(i) = ir^ and 
pa(t + 1) = 7Tfc. In our implementation the splitting coordinate 0Jk is the one corresponding to the 
coordinate in the set l 7Tk with largest interval. The decision point 7^ is taken to be the middle point of 
this interval. 
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The "bound" part of the optimization is as follows. Denote g (d) the objective function in (fTTT ) and 
define its minimum over the set Zt C K,: 



5min (It) = mhig (d) . 
dex t 

A lower bound for this value can be obtained by relaxing the constraint d G K, in dT9b : 



min g(d) < g min (I t ) 
deSt 

it 



(20) 



(21) 



Letting X s = + X" where X" = X S A]A S and X^~ = X s (I — A\A S ), we have: 

2 
F 

|| . . 9 

+ Xs 



\X s -M(F,d)A s \\ 2 F = 1 1 (x s At — M (F, d)j A 



2 

If 



where At denotes the pseudoinverse of A s . This leads to: 

2 



\{A S AT) X s A\-M(F,d) 



+ 



<g(d), 



(22) 



where A (A s A^) denotes the smallest eigenvalue of the symmetric matrix A s Aj. Combining the 
relaxation in (l2Tb with inequality (|22l . we obtain a lower bound on <7 m i n (It): 

2 
F 

, , ,9 



<T»„, (J/) =niin t/ ^..A(A..Ai) X s A+-M(F,d) 

HI 2 



(23) 



which can be evaluated by performing / decoupled discrete grid searches. At the k-th step, the splitting 
node 7Tfc will be chosen as the one with smallest ^u, (I t )- Finally, this lower bound is complemented by 
the upper bound 



Smin (It) < ®ub {It) = g (d) for Vd € If 
These bounds enable the branch-and-bound optimization of (fTTT ). 



(24) 



C. Selection of the tuning parameters f, A and (3 

From (©, it is clear that the OPFA factorization depends on the choice of /, A and j3. This is a 
paramount problem in unsupervised learning, and several heuristic approaches have been devised for 
simpler factorization models 11301 . ED, J9]. These approaches are based on training the factorization 
model on a subset of the elements of the data matrix (training set) to subsequently validate it on the 
excluded elements (test set). 
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Fig. 5. Dictionary used to generated the 2-factor synthetic data of Section ITVl 



The variational characterization of the OPFA decomposition allows for the presence of missing vari- 
ables, i.e. missing elements in the observed matrices {.X" s } s=1 . In such case, the Least Squares fitting 
term in © is only applied to the observed set of indices^. We will hence follow the approach in Q and 
train the OPFA model over a fraction 1 — S of the entries in the observations X s . Let U s denote the set 
of 6 (n X p) excluded entries for the s-th observation. These entries will constitute our test set, and thus 
our Cross- Validation error measure is: 

2 

n s 

S=L 

where F, ' are ^ e O^A estimates obtained on the training set excluding the entries 

in {fi s } s=1 , for a given choice of /, A, and f3. 



1 S 

CV(/,A,/3) = -J] [x s -m(fJ s )A 

s=l 



IV. Numerical results 

A. Synthetic data: Periodic model 

First we evaluate the performance of the OPFA algorithm for a periodic model observed in additive 



Gaussian white noise: 



X s = M(F,d s ) A s + e s s = l,..., 5. 



(25) 



Here e s ~ M nxp (0, 07 i"), d s = sort (t s ) where 07 is the variance of e s and t s ^0, y 12cr^ + 1J are 
i.i.d. The / = 2 columns of F are non-random smooth signals from the predefined dictionary shown in 
Figure [5] The scores A s are generated according to a consistent sparsity pattern across all subjects and 
its non zero elements are i.i.d. normal truncated to the non-negative orfhant. 



3 See the Appendix Icl and 151 for the extension of the EstimateFactors, EstimateScores and Estimatedelays procedures to the 
case where there exist missing observations. 
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TABLE II 

Models considered in Section IV-A. 



Model 


M s 


As 


OPFA 


M s = M(F,d s ) 
d s e K, F smooth 
and non-negative 


Non-negative 
sparse A s 


OPFA-C 


Ms = M(F,d s ) 
d s e K, F smooth 
and non-negative 


Non-negative 
sparse 
Ai = ■ ■ ■ = As 


SFA 


Ms =M(F,d s ), 
d s = ,F smooth 
and non-negative 


Non-negative 
sparse A s 



Here the number of subjects is S = 10, the number of variables is p = 100, and the number of 
time points is n = 20. In these experiments experiments we choose to initialize the factors F with 
temporal profiles obtained by hierarchical clustering of the data. Hierarchical clustering ||32~I is a standard 
unsupervised learning technique that groups the p variables into increasingly finer partitions according 
to the normalized euclidean distance of their temporal profiles. The average expression patterns of the 
clusters found are used as initial estimates for F. The loadings {A s }f =1 are initialized by regressing the 
obtained factors onto the data. 

We compare OPFA and OPFA-C to a standard Sparse Factor Analysis (SFA) solution, obtained by 
imposing <i max = in the original OPFA model. Table [TT] summarizes the characteristics of the three 
models considered in the simulations. We fix / = 2 and choose the tuning parameters (A, f3) using the 
Cross-Validation procedure of Section ITlI-CI with a 5 x 3 grid and 6 = .1. 

In these experiments, we consider two measures of performance, the Mean Square Error (MSE) with 
respect to the generated data: 

MSE:=^J2E\\D a -D a * , 

b s=i 1 F 

where E is the expectation operator, D s = M(F,d s )A s is the generated noiseless data and D s = 
M (f, d s ^j A s is the estimated data, and the Distance to the True Factors (DTF), defined as: 

1 ' „ E T :F i 



DTF := 1 V E — 

f ^ II F || 

•> i=i ||-r.,i|| 2 



F ;i 



2 

where F, F are the generated and the estimated factor matrices, respectively. 
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Figure [6] shows the estimated MSE and DTF performance curves as a function of the delay variance 



aj for fixed SNR= 15dB (which is defined as SNR = 10 log ( ^ E s ^ w^ 3 '^ ) )■ 0PFA and 



OPFA-C perform at least as well as SFA for zero delay (ad = 0) and significantly better for ad > 
in terms of DTF OPFA-C outperforms OPFA for high delay variances a\ at the price of a larger MSE 
due to the bias introduced by the constraint A\ = ■ ■ ■ = As- In Figure [7] the performance curves are 
plotted as a function of SNR, for fixed a\ = 5. Note that OPFA and OPFA-C outperform SFA in terms 
of DTF and that OPFA is better than the others in terms of MSE for SNR> Odb. Again, OPFA-C shows 
increased robustness to noise in terms of DTF 

We also performed simulations to demonstrate the value of imposing the order-preserving constraint 
in ( fTTT ). This was accomplished by comparing OPFA to a version of OPFA for which the constraints in 
(fTTT ) are not enforced. Data was generated according to the model (|25T ) with S = 4, n = 20, / = 2, 
and a 2 d = 5. The results of our simulations (not shown) were that, while the order-preserving constraints 
never degrade OPFA performance, the constraints improve performance when the SNR is small (below 
3dB for this example). 

Finally, we conclude this sub-section by studying the sensitivity of the final OPFA estimates with 
respect to the initialization choice. To this end, we initialize the OPFA algorithm with the correct model 
perturbed with a random gaussian vector of increasing variance. We analyze the performance of the 
estimates in terms of MSE and DTF as a function of the norm of the model perturbation relative to 
the norm of the noiseless data, which we denote by p. Notice that larger p corresponds to increasingly 
random initialization. The results in Table [HI] show that the MSE and DTF of the OPFA estimates are 
very similar for a large range of values of p, and therefore are robust to the initialization. 

B. Experimental data: Predictive Health and Disease (PHD) 

The PHD data set was collected as part of a viral challenge study that is described in ifTTl . In this 
study 20 human subjects were inoculated with live H3N2 virus and Genechip mRNA gene expression in 
peripheral blood of each subject was measured over 16 time points. The raw Genechip array data was 
pre-processed using robust multi-array analysis [22 ] with quantile normalization ||33l . In this section we 
show results for the constrained OPFA model (OPFA-C). While not shown here, we have observed that 
OPFA-C gives very similar results to unconstrained OPFA but with reduced computation time. 

Specifically, we use OPFA-C to perform the following tasks: 

1) Subject Alignment: Determine the alignment of the factors to fit each subject's response, therefore 
revealing each subject's intrinsic response delays. 
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TABLE III 

Sensitivity of the OPFA estimates to the initialization choice with respect to the relative 

NORM OF THE PERTURBATION (p). 





DTF [ mean (standard deviation) ] X 10 3 


SNR 


p = 0.002 


p = 1.08 


p = 53.94 


22.8 


0.0 (0.0) 


3.4 (9.4) 


1.9 (3.2) 


-2 


1.3 (0.5) 


1 (9.4) 


1.25 (1.5) 


-27.1 


46 (20) 


58 (17) 


63 (8) 




MSE [ mean (standard deviation xlO 3 ) ] 


SNR 


p = 0.002 


p = 1.08 


p = 53.94 


22.8 


0.02 (1.5) 


0.05 (69) 


0.11 (99) 


-2 


0.35 (7.9) 


0.36(22) 


0.38 (32) 


-27.1 


0.96 (19) 


0.99 (18) 


1.00 (24) 




Fig. 6. MSE (top) and DTF (bottom) as a function of delay variance g\ for OPFA and Sparse Factor Analysis (SFA). These 
curves are plotted with 95% confidence intervals. For a 2 d > 0, OPFA outperforms SFA both in MSE and DTF, maintaining its 
advantage as ad increases. For large ad, OPFA-C outperforms the other two. 



2) Gene Clustering: Discover groups of genes with common expression signature by clustering in 
the low-dimensional space spanned by the aligned factors. Since we are using the OPFA-C model, 
the projection of each subject's data on this lower dimensional space is given by the scores A := 
A\ = ■ ■ ■ = As- Genes with similar scores will have similar expression signatures. 

3) Symptomatic Gene Signature discovery: Using the gene clusters obtained in step 2 we construct 
temporal signatures common to subjects who became sick. 

The data was normalized by dividing each element of each data matrix by the sum of the elements in 
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Fig. 7. Same as Figure [6] except that the performance curves are plotted with respect to SNR for fixed a\ — 5. 

its column. Since the data is non-negative valued, this will ensure that the mixing weights of different 
subjects are within the same order of magnitude, which is necessary to respect the assumption that 
A\ = ■ ■ ■ = As in OPFA-C. In order to select a subset of strongly varying genes, we applied one-way 
Analysis-Of- Variance [34] to test for the equality of the mean of each gene at 4 different groups of time 
points, and selected the first p = 300 genes ranked according to the resulting F-statistic. To these gene 
trajectories we applied OPFA-C to the 5 = 9 symptomatic subjects in the study. In this context, the 
columns in F are the set of signals emitted by the common immune system response and the vector d s 
parameterizes each subject's characteristic onset times for the factors contained in the columns of F. To 
avoid wrap-around effects, we worked with a factor model of dimension n = 24 in the temporal axis. 

The OPFA-C algorithm was run with 4 random initializations and retained the solution yielding the 
minimum of the objective function (6). For each / = 1, • • • ,5 (number of factors), we estimated the 
tuning parameters (A,/3) following the Cross-Validation approach described in IIII-CI over a 10 x 3 grid. 
The resulting results, shown in Table HVl resulted in selecting j3 = 1 X 10~ 8 , A = 1 x 10 -8 and / = 3. The 
choice of three factors is also consistent with an expectation that the principal gene trajectories over the 
period of time studied are a linear combination of increasing, decreasing or constant expression patterns 



To illustrate the goodness-of-fit of our model, we plot in Figure [8] the observed gene expression patterns 
of 13 strongly varying genes and compare them to the OPFA-C fitted response for three of the subjects, 
together with the relative residual error. The average relative residual error is below 10% and the plots 
demonstrate the agreement between the observed and the fitted patterns. Figure [9] shows the trajectories 



[11]. 
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TABLE IV 

Cross Validation Results for Section HV-BI 





f = l 


/ = 2 


f = 3 


/ = 4 


/ = 5 


minCV(/,A,/3) 


20.25 


13.66 


12.66 


12.75 


12.72 


Relative residual 
error (xlCT 3 ) 


7.2 


4.8 


4.5 


4.5 


4.4 


A(xl0" 8 ) 


5.99 


1 


1 


1 


35.9 


p (xicn 6 ) 


3.16 


3.16 


0.01 


0.01 


100 




Fig. 8. Comparison of observed (O) and fitted responses (R) for three of the subjects and a subset of genes in the PHD data 
set. Gene expression profiles for all subjects were reconstructed with a relative residual error below 10%. The trajectories are 
smoothed while respecting each subject's intrinsic delay. 



for each subject for four genes having different regulation motifs: up-regulation and down-regulation. It 
is clear that the gene trajectories have been smoothed while conserving their temporal pattern and their 
precedence-order, e.g. the up-regulation of gene OAS1 consistently follows the down-regulation of gene 
ORM1. 

In Figure [10] we show the 3 factors along with the factor delays and factor loading discovered by OPFA- 
C. The three factors, shown in the three bottom panels of the figure, exhibit features of three different 
motifs: factor 1 and factor 3 correspond to up-regulation motifs occurring at different times; and factor 2 
is a strong down-regulation motif. The three top panels show the onset times of each motif as compared 
to the clinically determined peak symptom onset time. Note, for example, that the strong up-regulation 
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Fig. 9. Comparison of observed (O) and fitted responses (R) for four genes (OAS1, CCR1, CX3CR1, ORM1) showing up- 
regulation and down-regulation motifs and three subjects in the PHD dataset. The gene trajectories have been smoothed while 
conserving their temporal pattern and their precedence-order. The OPFA-C model revealed that OAS1 up-regulation occurs 
consistently after ORM1 down-regulation among all subjects. 
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Fig. 10. Top plots: Motif onset time for each factor (□) and peak symptom time reported by expert clinicians (O). Bottom 
plots: Aligned factors for each subject. Factor 1 and 3 can be interpreted as up-regulation motifs and factor 2 is a strong 
down-regulation pattern. The arrows show each factor's motif onset time. 



pattern of the first factor coincides closely with the onset peak time. Genes strongly associated to this 
factor have been closely associated to acute anti- viral and inflammatory host response [11]. Interestingly, 
the down-regulation motifs associated with factor 2 consistently precedes this up-regulation motif. 

Finally, we consider the application of OPFA as a pre-processing step preceding a clustering analysis. 
Here the goal is to find groups of genes that share similar expression patterns and determine their 
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characteristic expression patterns. In order to obtain gene clusters, we perform hierarchical clustering on 

S *7 

the raw data ({-Xs} s=1 ) and on the lower dimensional space of the estimated factor scores ( {A s }* =1 ), 
obtaining two different sets of 4 well-differentiated clusters. We then compute the average expression 
signatures of the genes in each cluster by averaging over the observed data ({-X" s } s=1 ) and averaging over 
the data after OPFA correction for the temporal misalignments. Figure [TT] illustrates the results. Clustering 
using the OPFA-C factor scores produces a very significant improvement in cluster concentration as 
compared to clustering using the raw data {X s } s=l . The first two columns in Figure compare the variation 
of the gene profiles over each cluster for the temporally realigned data (labeled V A') as compared to to 
the profile variation of these same genes for the misaligned observed data (labeled V M')- For comparison, 
the last column shows the results of applying hierarchical clustering directly to the original misaligned 
dataset {X s } s=l . It is clear that clustering on the low-dimensional space of the OPFA-C scores unveils 
interesting motifs from the original noisy temporal expression trajectories. 

V. Conclusions 

We have proposed a general method of order-preserving factor analysis that accounts for possible 
temporal misalignments in a population of subjects undergoing a common treatment. We have described 
a simple model based on circular-shift translations of prototype motifs and have shown how to embed 
transient gene expression time courses into this periodic model. The OPFA model can significantly 
improve interpretability of complex misaligned data. The method is applicable to other signal processing 
areas beyond gene expression time course analysis. 

A Matlab package implementing OPFA and OPFA-C will be available at the Hero Group Reproducible 
Research page (http://tbayes.eecs.umich.edu). 

Appendix 

A. Circulant time shift model 

Using circular shifts in © introduces periodicity into our model ©. Some types of gene expression 
may display periodicity, e.g. circadian transcripts, while others, e.g. transient host response, may not. For 
transient gene expression profiles such as the ones we are interested in here, we use a truncated version 
of this periodic model, where we assume that each subject's response arises from the observation of a 
longer periodic vector within a time window (see Figure [12): 

X s = [M(F,d s )A s + e s ] n . (26) 
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Fig. 11. The first two columns show the average expression signatures and their estimated upper/lower confidence intervals for 
each cluster of genes obtained by: averaging the estimated Aligned expression patterns over the S — 9 subjects (A) and directly 
averaging the misaligned observed data for each of the gene clusters obtained from the OPFA-C scores (M). The confidence 
intervals are computed according to +/— the estimated standard deviation at each time point. The cluster average standard 
deviation (<r) is computed as the average of the standard deviations at each time point. The last column shows the results of 
applying hierarchical clustering directly to the original misaligned dataset {X s }f =1 . In the first column, each gene expression 
pattern is obtained by mixing the estimated aligned factors F according to the estimated scores A. The alignment effect is clear, 
and interesting motifs become more evident. 



Here, the factors are of dimension np > n and the window size is of dimension n (in the special case 
that n = njp, we have the original periodic model). In this model, the observed features are non-periodic 
as long as the delays d s are sufficiently small as compared to np. More concretely, if the maximum 
delay is d max , then in order to avoid wrap-around effects the dimension should be chosen as at least 
np = n + d max . Finally, we define the index set Q corresponding to the observation window as: 

n = {uj 1 ,^ 1 + lV + 2...,w 2 } P (27) 
where oj 1 and J 2 are the lower and upper observation endpoints, verifying n = uo 2 — oA 
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M(F,d) 




Fig. 12. Right: Each subject's factor matrix Mi is obtained by applying a circular shift to a common set of factors F 
parameterized by a vector d. Left: In order to avoid wrap-around effects when modeling transient responses, we consider 
instead a higher dimensional truncated model of dimension Uf from which we only observe the elements within the window 
characterized by Q. 



B. Delay estimation and time-course alignment 

The solution to problem © yields an estimate d s for each subject's intrinsic factor delays. These 
delays are relative to the patterns found in the estimated factors and therefore require conversion to a 
common reference time. 

For a given up-regulation or down-regulation motif, /, which we call the feature of interest, found in 
factor g, we choose a time point of interest tj. See Figure [13] (a) for an example of choice of ti for an 
up-regulation feature. 

Then, given ti and for each subject s and each factor k, we define the absolute feature occurrence 
time as follows: 

t Sjk = (d s k + t^j mod np-oj 1 . (28) 

where d s g is the estimated delay corresponding to factor k and subject s and uo 1 is the lower endpoint of 
the observation window (see (127T)). Figure [T3l illustrates the computation of t s ^ in a 2-factor example. 

The quantities t Sj k can be used for interpretation purposes or to realign temporal profiles in order to 
find common, low-variance gene expression signatures, as shown in Section IIV-BI 

C. Implementation of Estimate Factors and EstimateScores 

We consider here the implementation of EstimateFactors and EstimateScores under the presence of 
missing data. Let Q s = [cuf, ■ ■ ■ ,ujp] € {0, i}" x p be the set of observed entries in observation X s . The 
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M(F,,d; 




(a) 




Fig. 13. (a) Time point of interest (ti) for the up-regulation feature of factor 1. (b) The absolute time points t\,\, ti,2 are 
shown in red font for two different subjects and have been computed according to their corresponding relative delays and the 
formula in l!28t. 



objective in © is then: 

s 

J2\\[X s -M(F,d s )A s ] Qa \\ 2 F (29) 

s=l 

We will show how to reformulate problems EstimateFactors (H~4b and EstimateScores (031 ) in a standard 
quadratic objective with linear and/or quadratic constraints. 
First, we rewrite the objective (|29l as: 



S P .. 2 



F 



£ £ I | dia § H) i x 'h - dia § K) M ( F ' dS ) 

8=1 J = l 

Expanding the square we obtain: 

S 

J2\\[X S -M (F, d s ) A s ] Qs \\ 2 F = Ef =1 EJ=i [As] T j M (F, d°f diag U) M (F, d°) [A B ].j 



-2 [X s ]^ diag [u>]) M (F, d s ) [A s ], tj 

+ Ef=i||[*JnX- (30) 



To obtain the EstimateFactors objective, first we will rewrite the OPFA model (1261 ) using matrix 
notation. Let U-i be a circular shift matrix U{ parameterized by the z-th delay d component. Then 

M (F, d) = [UiF l: ■■■ , UfF f ] = HF 
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where Fj denotes the j-th column of F, H is the concatenation of the U-i matrices and F is a matrix 
containing the columns of F with the appropriate padding of zeros. With this notation and (l30l ) we 



obtain: 

s 



J2\\[X.-M (F, d°) A s ] ns \\ F ot F Ef=i EU tr ([A.], j [A S ] T . F T Hjdi ag ( w j) H S F 



2tr(>i,;. ; ;x,; 7 ; diag U])ii,f 



tr (ZX T YW) = vec (W) 1 Z ® r T vec (X) , 



We now use the identity ( 11351 , Thm. 3 Sec. 4) 



-T 



to write: 

S 



J2\\[Xs-M(F,d s )A s 



oc F vec IF J (E s =i E?=i [^Lj [^1 j ® ^Jdiag f JEf J vec 



2vec f Ef =1 E?=i Hjdmg [ ) [X s ].. [A.]?j) vec 



Finally, making use of the fact that F is a block-column matrix with the columns of F padded by zeros, 
we conclude: 

s 

^2\\[X a -M (F, d s ) A s ] Us | \ 2 F <x F vec (F) T Q F vec (F) - 2^vec (F) 

8=1 

where we have defined 



Qi 



s P 



E E i A »ij i A *Z ® // ' dia ^ H) ^ 

8=1 J = l 



5 P 



vec ( £ £ i/Jdiag [X S ]. J [A^. 

8 = 1 j = l 



J, J 



J 



(31) 



(32) 



and J are the indices corresponding to the non-zero elements in vec (^F^j . Hence, EstimateFactors can 
be written as: 

min vec {F) T (Q F + /3diag ( W T W, ■ ■ ■ , W T W) ) vec (F) - 2q£vec (F) 



s.t. 



Fjj > i = 1, ■ ■ ■ ,n, 



The dimension of the variables in this problem is nf. In the applications considered here, both n and / 
are relatively small and hence this program can be solved with a standard convex solver such as SeDuMi 
||36l (upon conversion to a standard conic problem). 
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On the other hand, we can follow the same procedure to reformulate the objective in Estimates cores 

(031 ) into a penalized quadratic form. First we use (l30l l and (Oil) to write: 

s 

J2\\[Xs-M (F, d s ) A s ] Qs | \ 2 F x {A , YLi vec ( A ^) T ^Avec (A.) - 2^ T vec (A s ) 



8=1 



where 



M (F, d s Y diag (wf ) M (F, d s ) 













M (F, d s f diag M (F, d s 



9a 



[X s ]\ diag (ul)M(F,d s 



[X s ] T p di ag (u s p )M(F,d s ) 



Thus EstimateFactors can be written as 



mm Es=i vec (A s ) J Q^vec (A s ) - 2<ft T vec (A s ) + A ELl Ej=i II [^i] 



I' 2 



(33) 



(34) 



(35) 



s.t. 



\F\&<6 



i = l,-- ,n, 

i = i,--- ,/ 

This is a convex, non-differentiable and potentially high-dimensional problem. For this type of opti- 
mization problems, there exists a class of simple and scalable algorithms which has recently received 
much attention 11371 . IT381 . 11391 , |[28l . These algorithms rely only on first-order updates of the type: 



x 



Tr, A (« - 2a!*- 1 (al - Q)) , 



(36) 



which only involves matrix-vector multiplications and evaluation of the operator T, which is called the 
proximal operator [39 1 associated to T and C and is defined as: 

7t,a {v) ■= min 2 X ' X v x ^ ^ 
s.t. x eC. 

This operator takes the vector v as an input and outputs a shrunk/thresholded version of it depending on 
the nature of the penalty F and the constraint set C. For some classes of penalties T (e.g. l\, I2, mixed 
h — fa) an d the positivity constraints considered here, this operator has a closed form solution iPlOl . [39|. 
Weak convergence of the sequence (l36l ) to the optimum of (l35l l is assured for a suitable choice of the 
constant a HH, EE). 
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D. Delay Estimation lower bound in the presence of Missing Data 

As we mentioned earlier, the lower bound (l23l l does not hold anymore under the presence of missing 
data. We derive here another bound that can be used in such case. From expression (l30l l. we first obtain 
the objective in EstimateDelays ( fTTT ) in a quadratic form: 



s,3 



J2\\[X S -M (F, d s ) A s ] ns 1 1 J oc d , Ef =1 E?=i * [M (F, d°) T diag (ufj M (F, d s ) P t 

s=l 

-2tr (Q sJ M(F,d s )). 

Where we have let P s j := [-A s ].j [A s ] T j and Q SJ - = [A s ].^- [X^- diag (ujj. Notice that each of the 
terms indexed by s is independent of the others. Using (|3TT >. we obtain 



[X 8 - M (F, d s ) A s ] Qs ||; oc vec (M (F, d s )) J £* =1 [P*j ® diag (^)j vec (M (F, d s )) 

T 



-2vec^=iQ^J vec(M(F,d s )). 
We now can minorize the function above by: 

\\[X S -M (F, d s ) A s ] Qs \\l > A (j2 P j=1 P Sjj ® diag (u;j)) vec (M (F, d s )) T vec (M (F, 

"2vec (E^ =1 Q^) T vec (M (F, cf)) + Ef=i 1 1 1 1 J. 

which leads to: 

1 1 [X s - M (F, d s ) A s ] ns \\ 2 F > A (Ef=i P.j ® diag (wj) ) ||F||| 

-2vec (E^ =1 Ql 3 ) T vec (M (F, d s )) + 1 1 [X s ] ns 1 1* . 
Using the relaxation in (l2~Tl ). we can now compute a lower bound (I t ) as 

^ (It) = A (E?=i ® diag [|F||^ + || [X.] n< ||* 

-2 max de5t vec (e?=i Q^) T vec (M (F, d s )) . 
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